# Low VRAM inference
## Bielik 4.5B V3.0 Instruct FP8 Dynamic

- License: Apache-2.0
- Author: speakleash
- Tags: Large Language Model, Other
- Downloads: 74 · Likes: 1

The FP8-quantized version of Bielik-4.5B-v3.0-Instruct. It uses AutoFP8 to quantize weights and activations to the FP8 data type, reducing disk space and GPU memory requirements by approximately 50%.
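Producing a checkpoint like this follows the AutoFP8 workflow. Below is a minimal sketch under stated assumptions: it uses the `auto-fp8` package's documented API, the output directory name is hypothetical, and the "dynamic" activation scheme computes activation scales at runtime, so no calibration data is needed.

```python
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

# FP8 dynamic quantization sketch: weights are converted offline, while
# activation scales are computed per run, so calibration data is unnecessary.
quantize_config = BaseQuantizeConfig(
    quant_method="fp8",
    activation_scheme="dynamic",
)

model = AutoFP8ForCausalLM.from_pretrained(
    "speakleash/Bielik-4.5B-v3.0-Instruct", quantize_config=quantize_config
)
model.quantize([])  # empty calibration set is fine for the dynamic scheme
model.save_quantized("./Bielik-4.5B-v3.0-Instruct-FP8-Dynamic")  # hypothetical path
```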
## Bielik 1.5B V3.0 Instruct FP8 Dynamic

- License: Apache-2.0
- Author: speakleash
- Tags: Large Language Model, Other
- Downloads: 31 · Likes: 1

An FP8 dynamic quantization of Bielik-1.5B-v3.0-Instruct, packaged for the vLLM and SGLang inference frameworks. AutoFP8 reduces parameter precision from 16 bits to 8 bits, significantly lowering disk space and GPU VRAM requirements.
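vLLM reads the quantization settings embedded in such checkpoints, so loading needs no extra flags on FP8-capable GPUs. A minimal sketch, assuming the repo id matches the card title:

```python
from vllm import LLM, SamplingParams

# Load the FP8-dynamic checkpoint; vLLM picks up the quantization
# configuration from the model's config files automatically.
llm = LLM(model="speakleash/Bielik-1.5B-v3.0-Instruct-FP8-Dynamic")

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Opowiedz krótko o Wiśle."], params)
print(outputs[0].outputs[0].text)
```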
## Gemma 3 27b It Qat GGUF

- License: MIT
- Author: ubergarm
- Tags: Large Language Model
- Downloads: 852 · Likes: 9

A GGUF quantization of the Gemma-3-27B instruction-tuned model built from its quantization-aware-training (QAT) weights, using advanced non-linear quantization schemes to retain high-quality text generation at low bit widths.
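GGUF files run on llama.cpp-based runtimes. A minimal llama-cpp-python sketch; the quant filename is hypothetical, and some non-linear quant types in repos like this may require a llama.cpp fork that supports them:

```python
from llama_cpp import Llama

# Load a GGUF quant of Gemma 3 27B; the filename below is hypothetical.
llm = Llama(
    model_path="./gemma-3-27b-it-qat-IQ4_XS.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU; reduce to fit less VRAM
    n_ctx=8192,       # context window size
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain QAT quantization in two sentences."}]
)
print(out["choices"][0]["message"]["content"])
```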
## Qwq 32B Bnb 4bit

- License: Apache-2.0
- Author: onekq-ai
- Tags: Large Language Model, Transformers
- Downloads: 167 · Likes: 2

A 4-bit quantized version of QwQ-32B, optimized using bitsandbytes, suitable for efficient inference in resource-constrained environments.
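Pre-quantized bitsandbytes checkpoints load directly through Transformers, which restores the 4-bit layers from the quantization config stored in the repo. A minimal sketch, assuming the repo id from the card title; requires the `bitsandbytes` package and a CUDA GPU:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "onekq-ai/QwQ-32B-bnb-4bit"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The 4-bit quantization config ships inside the checkpoint, so no
# explicit BitsAndBytesConfig is needed for a pre-quantized repo.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # compute dtype for the non-quantized ops
)

inputs = tokenizer("Think step by step: what is 17 * 24?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```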
## Cogvideox1.5 5B

- License: Other
- Author: THUDM
- Tags: Text-to-Video, English
- Downloads: 11.12k · Likes: 36

CogVideoX is an open-source video generation model from the same family as the Qingying service, supporting high-resolution text-to-video generation.
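Given this section's low-VRAM focus, here is a minimal diffusers sketch with CPU offload and VAE tiling enabled; the prompt and sampling settings are illustrative only:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # keep only the active submodule on the GPU
pipe.vae.enable_tiling()         # decode frames in tiles to cap peak VRAM

video = pipe(
    prompt="A golden retriever runs through a sunlit meadow",
    num_inference_steps=50,
    num_frames=81,
).frames[0]
export_to_video(video, "output.mp4", fps=16)
```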
## Dorna Llama3 8B Instruct Quantized4Bit

- Author: amirMohammadi
- Tags: Large Language Model, Transformers, Supports Multiple Languages
- Downloads: 22 · Likes: 11

A 4-bit quantized version of Dorna-Llama3-8B-Instruct, a Llama-3 model tuned for Persian, using Flash Attention 2 for improved inference efficiency.
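Flash Attention 2 is selected at model-load time in Transformers. A minimal sketch, assuming the repo id from the card title; requires the `flash-attn` package and an Ampere-or-newer GPU:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amirMohammadi/Dorna-Llama3-8B-Instruct-Quantized4Bit"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# attn_implementation="flash_attention_2" swaps in the FlashAttention-2 kernels.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

messages = [{"role": "user", "content": "سلام! یک شعر کوتاه درباره بهار بنویس."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```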